Quality of Word Embeddings on Sentiment Analysis Tasks

Abstract

Word embeddings, or distributed representations of words, are used in applications such as machine translation, sentiment analysis, and topic identification. The quality of word embeddings, and the performance of the applications that use them, depends on several factors, including the training method and the size and relevance of the training corpus. In this study we compare the performance of a dozen pretrained word embedding models on lyrics sentiment analysis and movie review polarity tasks. According to our results, the Twitter Tweets model performs best on lyrics sentiment analysis, whereas the Google News and Common Crawl models are the top performers on movie review polarity analysis. GloVe-trained models slightly outperform those trained with Skip-gram. Factors such as topic relevance and corpus size also significantly affect model quality. When medium- or large-sized text collections are available, obtaining word embeddings from the same training dataset used for the task is usually the best choice.
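The abstract does not say which classifier was placed on top of the embeddings. A common way to compare pretrained embedding models on a polarity task is to represent each text as the average of its word vectors and fit a simple classifier on those averages. The sketch below assumes that setup. The two vocabularies, their 2-dimensional vectors, and the example sentences are hypothetical toy data, and the nearest-centroid classifier is a deliberately simple stand-in (logistic regression would be a more typical choice); none of this is the authors' method.

```python
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Embeddings = Dict[str, Vector]
Dataset = List[Tuple[str, str]]  # (text, label) pairs


def average_embedding(text: str, embeddings: Embeddings, dim: int) -> Vector:
    """Mean of the vectors of in-vocabulary tokens; zero vector if none are known."""
    tokens = [tok for tok in text.lower().split() if tok in embeddings]
    if not tokens:
        return [0.0] * dim
    total = [0.0] * dim
    for tok in tokens:
        for i, x in enumerate(embeddings[tok]):
            total[i] += x
    return [x / len(tokens) for x in total]


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def polarity_accuracy(embeddings: Embeddings, dim: int, train: Dataset, test: Dataset) -> float:
    """Fit one centroid per label on `train`, then score nearest-centroid predictions on `test`."""
    centroids: Dict[str, Vector] = {}
    # Sorted labels make tie-breaking deterministic: min() returns the first minimal key.
    for label in sorted({lab for _, lab in train}):
        vecs = [average_embedding(t, embeddings, dim) for t, lab in train if lab == label]
        centroids[label] = [sum(col) / len(vecs) for col in zip(*vecs)]
    correct = 0
    for text, label in test:
        vec = average_embedding(text, embeddings, dim)
        pred = min(centroids, key=lambda lab: squared_distance(vec, centroids[lab]))
        correct += pred == label
    return correct / len(test)


# Hypothetical model A: dimension 0 loosely encodes polarity, dimension 1 topic.
model_a: Embeddings = {
    "great": [1.0, 0.0], "good": [0.8, 0.1], "love": [0.9, 0.2],
    "awful": [-1.0, 0.0], "bad": [-0.8, 0.1], "hate": [-0.9, 0.2],
    "movie": [0.0, 1.0], "song": [0.0, 1.0],
}
# Hypothetical model B: smaller vocabulary, and "great"/"awful" share one vector.
model_b: Embeddings = {
    "great": [0.1, 0.9], "awful": [0.1, 0.9],
    "movie": [0.0, 1.0], "song": [0.0, 1.0],
}

train_set: Dataset = [
    ("great movie", "pos"), ("love this song", "pos"),
    ("awful movie", "neg"), ("hate this song", "neg"),
]
test_set: Dataset = [
    ("good song", "pos"), ("bad movie", "neg"),
    ("great song", "pos"), ("awful song", "neg"),
]

acc_a = polarity_accuracy(model_a, 2, train_set, test_set)
acc_b = polarity_accuracy(model_b, 2, train_set, test_set)
print(f"model_a accuracy: {acc_a:.2f}")
print(f"model_b accuracy: {acc_b:.2f}")
```

In this toy setup model A classifies every test sentence correctly, while model B produces identical centroids for both labels, so every prediction falls back to the alphabetically first label and accuracy drops to 0.50. The evaluation loop stays fixed and only the embedding table changes, which is the kind of controlled comparison the study describes; real experiments would load actual pretrained vectors and use a stronger classifier.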
